Abstract: We design opportunistic spectrum access strategies
for  improving  spectrum  efficiency.  In  each  slot,  a  secondary 
user  chooses  a  subset  of  channels  to  sense  and  decides  whether 
to  access  based  on  the  sensing  outcomes.  Incorporating  the 
secondary  user’s  residual  energy  and  buffer  state,  we  formulate 
this  sequential  decision-making  problem  as  a  partially  observable 
Markov decision process (POMDP). Within the POMDP framework,
we obtain stationary optimal sensing and access policies.
By  exploiting  the  rich  structure  of  the  underlying  problem, 
we  develop  monotonicity  results  for  the  optimal  policies,  which 
accelerate  the  computations.  Numerical  results  are  provided  to 
study  the  impact  of  the  secondary  user’s  packet  arrival  rate  and 
residual  energy  on  the  optimal  sensing  and  access  decisions. 

I.  Introduction 

Opportunistic spectrum access (OSA) is one of the approaches
envisioned for dynamic spectrum management [1]. It
has  received  increasing  attention  due  to  its  compatibility  with 
the  current  spectrum  management  policy  and  legacy  wireless 
systems.  The  basic  idea  of  OSA  is  to  allow  secondary  users 
to  search  for  and  exploit  local  and  instantaneous  spectrum 
availability  in  a  non-intrusive  manner.  Correspondingly,  basic 
design  components  of  OSA  include  1)  a  sensing  strategy  that 
specifies  whether  to  sense  and  where  in  the  spectrum  to  sense 
and  2)  an  access  strategy  that  determines  whether  to  access 
based  on  the  sensing  outcomes. 

Related  Work  The  design  and  implementation  of  OSA  have 
been  addressed  in  the  literature  [2]-[7].  In  [2],  the  authors 
address  the  implementation  of  OSA  in  an  ad  hoc  secondary 
network  overlaying  a  GSM  cellular  network.  In  [3],  optimal 
distributed  MAC  protocols  are  proposed  within  the  framework 
of  partially  observable  Markov  decision  process  (POMDP). 
The  proposed  protocols  ensure  synchronous  hopping  of  the 
secondary  transmitter  and  receiver  in  the  spectrum  without 
introducing  extra  control  message  exchange.  More  recently, 
[4]  exploits  the  channel  fading  in  the  design  of  OSA  for 
an efficient use of secondary users' energy. In [5], a separation
principle is established for the optimal joint design
of the physical layer spectrum sensor and the MAC layer
sensing and access policies. In [6], access strategies for a
slotted secondary user searching for opportunities in an
unslotted primary network are considered, where a round-robin
single-channel sensing scheme is used. Modeling of spectrum
occupancy has been addressed in [7]. Measurements obtained
from spectrum monitoring test-beds demonstrate the Markovian
transition between busy and idle channel states in wireless
LAN. For an overview of recent developments in OSA and a
survey of other dynamic spectrum access approaches, readers
are referred to [8].

This work was supported in part by the Army Research Laboratory
Collaborative Technology Alliance on Communication and Networks under
Grant DAAD19-01-2-0011 and by the National Science Foundation under
Grants CNS-0627090 and ECS-0622200.

Contributions  This  paper  extends  [4]  by  incorporating  both  the 
bursty  traffic  and  the  energy  constraint  of  secondary  users  into 
OSA  design.  We  consider  a  secondary  network  whose  users 
independently  and  selfishly  seek  spectrum  opportunities  in  a 
slotted  primary  network.  We  formulate  the  sequential  sensing 
and  access  decision-making  of  a  secondary  user  as  a  POMDP 
problem,  which  takes  into  account  the  channel  fading,  the 
residual  energy  as  well  as  the  buffer  state  of  the  secondary 
user.  We  show  that  this  POMDP  terminates  in  a  finite  but 
random  time.  The  optimal  sensing  and  access  strategies  are 
thus  given  by  the  stationary  optimal  policies  of  this  POMDP. 

By  exploiting  the  rich  structure  of  the  underlying  problem, 
we  then  develop  monotonicity  results  for  the  optimal  policies. 
In  particular,  we  show  that  for  the  one-channel  case,  the 
optimal  sensing  policy  is  a  threshold  policy:  the  secondary 
user  with  packets  to  transmit  should  sense  a  channel  if  and 
only  if  (iff)  the  conditional  probability  that  this  channel  is 
available  is  above  a  certain  threshold.  Moreover,  the  optimal 
access  policy  is  also  a  threshold  policy:  the  secondary  user 
should  transmit  over  an  idle  channel  iff  the  channel  fading 
level  is  below  a  certain  threshold.  These  monotonicity  results 
can  help  us  accelerate  the  calculation  of  the  optimal  sensing 
and  access  policies. 

Finally,  we  provide  numerical  results  to  study  different 
factors  that  affect  the  optimal  sensing  and  access  decisions.  We 
find  that  the  impact  of  the  secondary  user’s  residual  energy  and 
buffer  state  on  the  optimal  decisions  diminishes  as  the  residual 
energy  increases.  We  also  see  the  benefit  of  sensing  a  channel 
even  if  the  secondary  user  does  not  have  any  packets  to  send. 

II.  System  Model 
A.  Primary  Network  Model 

We  consider  a  spectrum  consisting  of  N  channels,  each 
with bandwidth $W_n$ ($n = 1, \ldots, N$). These N channels
are licensed to a slotted primary network. Let
$S_n(t) \in \{0\ (\text{busy}),\ 1\ (\text{idle})\}$ denote the occupancy of channel n by
the primary network in slot t. We assume that the spectrum
occupancy state (SOS) $S(t) = [S_1(t), \ldots, S_N(t)]$ follows a
time-homogeneous discrete Markov process with state space
$\mathcal{S}$ defined as

$$\mathcal{S} = \{0, 1\}^N, \quad \text{where } |\mathcal{S}| = 2^N. \tag{1}$$

The  transition  probabilities  are  denoted  as 

$$P_S(s' \mid s) = \Pr\{S(t) = s' \mid S(t-1) = s\}, \quad s, s' \in \mathcal{S}, \tag{2}$$

which  represents  the  probability  that  the  SOS  transits  from  s 
to  s'  at  the  beginning  of  slot  t.  The  transition  probabilities  of 
the  SOS  are  determined  by  the  statistics  of  the  primary  traffic. 
We  assume  that  they  are  known  or  have  been  learned. 
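
As an illustration of how the transition probabilities in (2) might be stored, the following minimal sketch builds $P_S(s' \mid s)$ under the additional assumption, not made in the paper, that the N channels evolve independently. The function names and the per-channel probabilities are hypothetical.

```python
from itertools import product

def sos_states(n_channels):
    """All 2^N occupancy vectors; 0 = busy, 1 = idle."""
    return list(product((0, 1), repeat=n_channels))

def sos_transition(p_idle_given):
    """Build P_S(s'|s), assuming (for illustration only) independent channels.

    p_idle_given[n] = (Pr{idle | was busy}, Pr{idle | was idle}) for channel n.
    Returns a dict keyed by (s, s_next).
    """
    states = sos_states(len(p_idle_given))
    table = {}
    for s in states:
        for s_next in states:
            prob = 1.0
            for n, (prev, nxt) in enumerate(zip(s, s_next)):
                p_idle = p_idle_given[n][prev]
                prob *= p_idle if nxt == 1 else 1.0 - p_idle
            table[(s, s_next)] = prob
    return table

# Two channels with made-up statistics.
P = sos_transition([(0.3, 0.7), (0.2, 0.9)])
```

Each row of the resulting table sums to one, and the table has $|\mathcal{S}|^2 = 4^N$ entries, which is why the state space grows quickly with N.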

B.  Secondary  Network  Model 

Consider  an  overlay  ad  hoc  secondary  network  whose  users 
seek instantaneous spectrum opportunities in these N channels.
At the beginning of each slot, a secondary user chooses
M ($1 \le M \le N$) channels to sense and determines whether
to  access  based  on  the  sensing  outcomes.  The  secondary  user 
can  also  turn  to  the  sleeping  mode  in  which  no  channel  will 
be  sensed  or  accessed  in  this  slot.  The  sequence  of  operations 
performed  by  the  secondary  user  in  each  slot  is  illustrated  in 
Fig.  1  and  will  be  detailed  in  Section  III-B. 

[Figure 1: structure of slot t. Decision making (sensing decision $A(t)$, sensing observation $\Theta(t)$, access decision $\Phi_\Theta(t)$) is followed by data transmission (reward accumulation) and an information update ($\Lambda(t) \to \Lambda(t+1)$, $B(t) \to B(t+1)$, $E(t) \to E(t+1)$).]

Fig. 1. The slot structure. The secondary user's knowledge of the SOS is
characterized by $\Lambda(t)$, and its buffer state and residual energy are denoted
by $B(t)$ and $E(t)$, respectively.

Our  goal  is  to  design  the  optimal  OSA  strategy  for  the 
secondary  user,  which  sequentially  specifies  which  channels 
to  sense  and  whether  to  access.  The  design  objective  is  to 
maximize the throughput of the secondary user during its battery
lifetime. For ease of presentation, we assume M = 1 (e.g.,
the case of single carrier communications). Our formulation
can  be  generalized  to  M  >  1. 

Traffic  Model  We  model  the  bursty  traffic  of  the  secondary 
user as a Poisson process$^1$ with rate $\lambda$. That is, the probability
of m packet arrivals in a slot is given by

$$q_m = \frac{\lambda^m e^{-\lambda}}{m!}, \quad m = 0, 1, \ldots \tag{3}$$

The  transmission  time  of  a  packet  is  assumed  equal  to  the  slot 
length.  We  assume  that  the  secondary  user  has  a  finite  buffer 
with  maximum  size  l.  Packets  are  dropped  when  the  buffer 
overflows. Let $B(t) \in \mathcal{B}$ denote the number of packets in the
secondary user's buffer at the beginning of slot t, where $\mathcal{B}$
contains all possible buffer states:

$$\mathcal{B} = \{0\ (\text{empty}),\ 1, \ldots, l\}. \tag{4}$$

Channel Model We adopt a block channel fading model.
Specifically, we assume that the channel gain between the
secondary user and its destination is a random variable (rv)
independently and identically distributed (iid) across slots but
not necessarily iid across channels.

1 Our formulation can be readily extended to the case where the secondary
user's packet arrivals follow a Markov-modulated Poisson process (MMPP).
See Section III-B for details.

Energy Model The secondary user is powered by a battery
with initial energy $\mathcal{E}_0$. We consider three types of energy
consumption by the secondary user in a slot. Let $e_p$ denote
the energy consumed in the sleeping mode and $e_s$ the energy
consumed in sensing the occupancy of a channel. The energy
consumed in transmitting over channel n is denoted by $E_{tx}(n)$.

We assume that the secondary user only has a finite number
L of transmission power levels due to hardware and power
limitations. According to the fading condition, the secondary user
adjusts its transmission power to ensure successful reception at
its destination. In general, the better the fading condition, the
lower the transmission power level. The transmission energy
consumption $E_{tx}(n)$ is a rv taking values from a finite set $\mathcal{E}_{tx}$:

$$\mathcal{E}_{tx} = \{\varepsilon_k\}_{k=1}^{L}, \quad 0 < \varepsilon_1 < \cdots < \varepsilon_L \le \infty, \tag{5}$$

where $\varepsilon_k$ is the energy consumed in transmitting at the kth
power level. By setting $\varepsilon_L = \infty$, we can include the case
where the channel is so badly faded that no transmission is
allowed. The distribution of the transmission energy consumption
$E_{tx}(n)$ is determined by the channel distribution, and is
denoted by

$$p_n(k) = \Pr\{E_{tx}(n) = \varepsilon_k\}, \quad k = 1, \ldots, L, \tag{6}$$

where $\sum_{k=1}^{L} p_n(k) = 1$.

Let  E(t)  denote  the  secondary  user’s  residual  energy  at 
the  beginning  of  slot  t.  Note  that  E(t)  is  a  rv  depending  on 
the  fading  conditions  and  the  secondary  user’s  actions  in  all 
previous slots. Since the transmission energy consumption is
restricted to the set $\mathcal{E}_{tx}$, the residual energy $E(t)$ belongs to

$$\mathcal{E} = \Big\{E : E = \mathcal{E}_0 - \sum_{k=1}^{L} c_k \varepsilon_k - c_s e_s - c_p e_p \ge 0;\ c_s \ge \sum_{k=1}^{L} c_k;\ c_k, c_s, c_p \in \mathbb{Z};\ c_k, c_s, c_p \ge 0\Big\} \cup \{0\}, \tag{7}$$

where $c_k$ is the number of slots when the secondary user
transmits at the kth power level, $c_s$ is the number of slots when
the secondary user senses a channel, and $c_p$ is the number of
slots when the secondary user operates in the sleeping mode.
Note that the secondary user must sense the channel before
accessing it in order to avoid collisions with the primary users.
We thus have $c_s \ge \sum_{k=1}^{L} c_k$.
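
The set in (7) can be enumerated by a simple search over per-slot energy costs. The sketch below is one possible way to do this, not the authors' procedure; it uses exact fractions to avoid floating-point drift, and the function name and example values are hypothetical.

```python
from fractions import Fraction

def residual_energies(e0, tx_levels, e_s, e_p):
    """Enumerate the residual energies of (7) by searching over per-slot costs.

    Each slot costs e_p (sleep), e_s (sense only), or e_s + eps_k (sense and
    transmit at level k), so c_s >= sum_k c_k holds by construction.
    """
    costs = [e_p, e_s] + [e_s + eps for eps in tx_levels]
    reachable, frontier = {e0}, [e0]
    while frontier:
        energy = frontier.pop()
        for cost in costs:
            nxt = energy - cost
            if nxt >= 0 and nxt not in reachable:
                reachable.add(nxt)
                frontier.append(nxt)
    reachable.add(Fraction(0))          # the explicit {0} element in (7)
    return sorted(reachable)

# Made-up example: E_0 = 1, one power level of 1/2, e_s = e_p = 1/4.
levels = residual_energies(Fraction(1), [Fraction(1, 2)],
                           Fraction(1, 4), Fraction(1, 4))
```

The size of this set determines how many energy levels a recursive solution has to visit, so coarse energy units keep the computation small.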

III.  A  Decision-Theoretic  Framework 

In  this  section,  we  formulate  the  energy-constrained  OSA 
design  as  an  unconstrained  POMDP. 

A.  Sequential  Decision-Making 

We  illustrate  in  Fig.  1  the  sequence  of  operations  in  each 
slot. At the beginning of slot t, the SOS transits to $S(t) \in \mathcal{S}$
according to the Markovian primary traffic model $P_S(s' \mid s)$.

Sensing  Decision  Based  on  its  knowledge  of  the  SOS  and  its 
local  buffer  state  B(t)  and  residual  energy  E(t),  the  secondary 
user  first  chooses  a  channel  A(t)  to  sense: 

$$A(t) \in \{0\ (\text{sleeping mode}),\ 1, \ldots, N\}, \tag{8}$$

where $A(t) = 0$ represents the sleeping mode.

Sensing Observation If a channel $A(t) = a > 0$ is sensed,
the secondary user observes the channel occupancy and fading
condition. The sensing outcome is denoted by

$$\Theta(t) \in \{0\ (\text{busy}),\ 1, \ldots, L\}, \tag{9}$$

where $\Theta(t) = 0$ indicates that the chosen channel is busy, and
$\Theta(t) = k > 0$ indicates that the chosen channel is idle and the
fading condition requires the secondary user to transmit at the
kth power level. We assume perfect spectrum sensing. Hence,
the distribution $U(k \mid s, a)$ of the sensing outcome $\Theta(t)$ given
current SOS and chosen channel $A(t) = a > 0$ is obtained as

$$U(k \mid s, a) = \Pr\{\Theta(t) = k \mid S(t) = s,\ A(t) = a\} = \begin{cases} p_a(k), & \text{if } s_a = 1,\ k \ne 0, \\ 1, & \text{if } s_a = 0,\ k = 0. \end{cases} \tag{10}$$
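
A direct transcription of (10) may help make the observation model concrete. This is a minimal sketch with hypothetical names; combinations not listed in (10), such as a busy channel with k > 0, are given probability zero here, which is our reading of perfect sensing.

```python
def observation_prob(k, s, a, p):
    """U(k|s,a) of (10) under perfect sensing.

    s: occupancy tuple (0 = busy, 1 = idle); a: sensed channel, 1-based;
    p[a-1][k-1] = p_a(k) for power levels k = 1..L.
    """
    if s[a - 1] == 0:                   # busy channel: only outcome k = 0
        return 1.0 if k == 0 else 0.0
    return 0.0 if k == 0 else p[a - 1][k - 1]

# Single channel with the fading distribution quoted in the Fig. 2 caption.
p = [[0.2, 0.3, 0.3, 0.2]]
u_idle = [observation_prob(k, (1,), 1, p) for k in range(5)]
```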


Access Decision Based on the sensing outcome $\Theta(t)$, the
secondary user determines whether to transmit over the chosen
channel $A(t) > 0$:

$$\Phi_\Theta(t) \in \{0\ (\text{no access}),\ 1\ (\text{access})\}. \tag{11}$$

Let $\Phi(t) = [\Phi_0(t), \Phi_1(t), \ldots, \Phi_L(t)]$ denote the set of access
decisions, one for each possible sensing outcome $\Theta(t) \in
\{0, \ldots, L\}$. Clearly, when $\Theta(t) = 0$ (busy), the secondary
user should refrain from transmission, i.e., $\Phi_0(t) = 0$. We
also note that the secondary user should not transmit (i.e.,
$\Phi_\Theta(t) = 0$) when it does not have enough energy to combat
the current channel fading (i.e., $E(t) < e_s + \varepsilon_\Theta$) or its buffer is
empty (i.e., $B(t) = 0$). Let $\mathcal{A}_c(B(t), E(t))$ denote the access
action space, which includes all allowable access decisions
$\Phi(t)$ given current buffer state $B(t)$ and residual energy $E(t)$:

$$\mathcal{A}_c(B(t), E(t)) = \{\Phi = [\Phi_0, \ldots, \Phi_L] \in \{0, 1\}^{L+1} : \Phi_0 = 0;\ \Phi_k = 0 \text{ if } E(t) < e_s + \varepsilon_k \text{ or } B(t) = 0\}. \tag{12}$$
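
The access action space in (12) can be enumerated by brute force, which also shows why its size grows as $2^L$ when energy is plentiful. The following is a sketch with hypothetical names and example values.

```python
from itertools import product

def access_action_space(buffer, energy, e_s, tx_levels):
    """All access vectors allowed by (12) for the given local state."""
    n_levels = len(tx_levels)
    actions = []
    for tail in product((0, 1), repeat=n_levels):
        phi = (0,) + tail               # Phi_0 = 0: never transmit when busy
        feasible = all(
            phi[k] == 0
            or (buffer > 0 and energy >= e_s + tx_levels[k - 1])
            for k in range(1, n_levels + 1)
        )
        if feasible:
            actions.append(phi)
    return actions

plenty = access_action_space(buffer=1, energy=10.0, e_s=0.5, tx_levels=[1, 2])
scarce = access_action_space(buffer=1, energy=1.6, e_s=0.5, tx_levels=[1, 2])
```

With ample energy all $2^L$ vectors are allowed; with an empty buffer only the all-zero vector remains.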


Information  Update  At  the  end  of  each  slot,  the  secondary 
user  can  update  its  knowledge  of  the  SOS  by  incorporating  the 
decisions  and  observations  made  in  this  slot  (see  Section  III-B 
for details). The secondary user's local state $(B(t), E(t))$ also
changes  due  to  the  packet  arrivals  and  energy  consumption 
in  this  slot.  Specifically,  since  the  packet  arrival  process  is 
assumed  to  be  Poisson,  the  number  of  arrivals  is  iid  across 
slots.  Hence,  the  evolution  of  the  buffer  state  is  a  Markov 
process  whose  transition  probabilities  are  given  by 

$$P_B^{(i)}(b' \mid b) = \Pr\{B(t+1) = b' \mid B(t) = b,\ i \text{ packets were sent}\} = \sum_{m=0}^{\infty} q_m\, 1_{[b' = \min\{b - i + m,\ l\}]}, \quad b, b' \in \mathcal{B}, \tag{13}$$

where i = 0, 1 is the number of packets delivered in this slot,
and l is the maximum buffer size. The residual energy reduces
from $E(t)$ to

$$E(t+1) = T_E(E(t) \mid A(t), \Phi(t), \Theta(t)) \triangleq \begin{cases} E(t) - e_p, & \text{if } A(t) = 0, \\ E(t) - e_s - 1_{[\Phi_\Theta = 1]}\, \varepsilon_\Theta, & \text{otherwise}, \end{cases} \tag{14}$$

where $1_{[x]}$ is an indicator function, $1_{[\Phi_\Theta = 1]}$ indicates whether
the secondary user has accessed the chosen channel, and $\varepsilon_\Theta$
is the energy required for a successful transmission. Note
that no observations and access decisions are made when
the secondary user is in the sleeping mode. For simplicity,
we will write $T_E(E(t) \mid A(t), \Phi(t), \Theta(t))$ as $T_E(E(t) \mid 0)$ when
$A(t) = 0$.

The updated SOS knowledge, buffer state $B(t+1)$, and
residual energy $E(t+1)$ are then used to make optimal
decisions in the next slot t + 1. The above procedure repeats
until the secondary user is incapable of successful transmission
under any channel fading condition, i.e., its residual energy
$E(t)$ drops below the minimum energy required to sense and
access a channel: $e_s + \min \mathcal{E}_{tx} = e_s + \varepsilon_1$.
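
The local state update described above can be sketched in a few lines. The code below is illustrative only: the function names are hypothetical, and the buffer rule assumes that packets arriving to a full buffer are dropped, i.e., $b' = \min\{b - i + m,\ l\}$.

```python
import math

def arrival_pmf(rate, m):
    """q_m of (3): probability of m Poisson arrivals in one slot."""
    return rate ** m * math.exp(-rate) / math.factorial(m)

def buffer_row(b, sent, rate, max_buffer):
    """Distribution of B(t+1) given B(t) = b and `sent` delivered packets.

    Assumption: overflow packets are dropped, so
    b' = min(b - sent + m, max_buffer); requires sent <= b.
    """
    base = b - sent
    row = [0.0] * (max_buffer + 1)
    for m in range(max_buffer - base):  # arrivals that leave room in the buffer
        row[base + m] = arrival_pmf(rate, m)
    row[max_buffer] = 1.0 - sum(row)    # every larger m ends in a full buffer
    return row

def next_energy(energy, a, accessed, k, e_p, e_s, tx_levels):
    """Residual energy update of (14); k is the observed power level (1-based)."""
    if a == 0:
        return energy - e_p             # sleeping mode
    spent = e_s
    if accessed and k > 0:
        spent += tx_levels[k - 1]
    return energy - spent

row = buffer_row(b=1, sent=1, rate=0.5, max_buffer=1)
```

Truncating the infinite sum in (13) is unnecessary here because all arrival counts that fill the buffer are lumped into the last entry of the row.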


B.  A  POMDP  Formulation 


The sequential decision-making process described in
Section III-A can be cast in the framework of POMDP.
Specifically, the system state can be characterized by the
following three components: 1) the SOS of the primary network
$S(t) \in \mathcal{S}$; 2) the buffer state $B(t) \in \mathcal{B}$ of the secondary
user; and 3) the residual energy $E(t) \in \mathcal{E}$ of the secondary
user$^2$. While the buffer state and the residual energy are fully
observable to the secondary user, the current SOS cannot be
directly observed due to partial spectrum monitoring. We thus
have a POMDP with composite system state space $\mathbb{S}$:

$$\mathbb{S} = \{(S, B, E) : S \in \mathcal{S},\ B \in \mathcal{B},\ E \in \mathcal{E}\}, \tag{15}$$

where $\mathcal{S}$, $\mathcal{B}$, and $\mathcal{E}$ are defined in (1), (4), and (7), respectively.


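Sufficient Statistics At the beginning of each slot t, the
secondary user's knowledge of the SOS is provided by its
decision and observation history$^3$ $Y(t) = \{A(\tau), \Theta(\tau)\}_{\tau=1}^{t-1}$. As
shown in [9], the statistical information on the SOS can
be encapsulated in a belief vector $\Lambda(t) = \{\lambda_s(t)\}_{s \in \mathcal{S}}$, where
$\lambda_s(t) \in [0, 1]$ and $\sum_{s \in \mathcal{S}} \lambda_s(t) = 1$. Each element $\lambda_s(t)$
represents the conditional probability (given the decision and
observation history) that the SOS is s at the beginning of this
slot prior to the state transition, i.e.,

$$\lambda_s(t) = \Pr\{S(t-1) = s \mid Y(t)\}. \tag{16}$$

The belief vector can be updated at the end of slot t by
incorporating the sensing decision $A(t)$ and the observation
$\Theta(t)$ in this slot. Specifically, applying Bayes' rule, we obtain
the updated belief vector $\Lambda(t+1) = \{\lambda_s(t+1)\}_{s \in \mathcal{S}}$ as
$\Lambda(t+1) = T_\Lambda(\Lambda(t) \mid A(t), \Theta(t))$, where, for $A(t) = a$ and
$\Theta(t) = k$,

$$\lambda_s(t+1) = \begin{cases} \sum_{s' \in \mathcal{S}} \lambda_{s'}(t) P_S(s \mid s'), & \text{if } A(t) = 0, \\[4pt] \dfrac{\sum_{s' \in \mathcal{S}} \lambda_{s'}(t) P_S(s \mid s')\, U(k \mid s, a)}{\sum_{\tilde{s} \in \mathcal{S}} \sum_{s' \in \mathcal{S}} \lambda_{s'}(t) P_S(\tilde{s} \mid s')\, U(k \mid \tilde{s}, a)}, & \text{otherwise}. \end{cases} \tag{17}$$

For simplicity, we will write $T_\Lambda(\Lambda(t) \mid A(t), \Theta(t))$ as
$T_\Lambda(\Lambda(t) \mid 0)$ when $A(t) = 0$.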
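
The belief update in (17) is a prediction step followed, when a channel is sensed, by a Bayes correction. The sketch below illustrates it for a generic state space; the dictionary layout, the function names, and the single-channel example are assumptions made for illustration.

```python
def belief_update(belief, P, a=0, k=None, U=None):
    """Belief update of (17).

    belief: dict s' -> lambda_{s'}(t); P: dict (s', s) -> P_S(s|s');
    U(k, s, a): observation probability; a = 0 means sleeping mode.
    """
    states = list(belief)
    predicted = {s: sum(belief[sp] * P[(sp, s)] for sp in states)
                 for s in states}
    if a == 0:
        return predicted                # no observation to condition on
    weighted = {s: predicted[s] * U(k, s, a) for s in states}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: w / total for s, w in weighted.items()}

# Single-channel example with P_S(1|1) = 0.7 and P_S(1|0) = 0.3.
P = {((0,), (0,)): 0.7, ((0,), (1,)): 0.3,
     ((1,), (0,)): 0.3, ((1,), (1,)): 0.7}
fading = [0.2, 0.3, 0.3, 0.2]

def U(k, s, a):
    """Perfect-sensing observation model for one channel."""
    if s[a - 1] == 0:
        return 1.0 if k == 0 else 0.0
    return 0.0 if k == 0 else fading[k - 1]

prior = {(0,): 0.5, (1,): 0.5}
after_sleep = belief_update(prior, P)
after_busy = belief_update(prior, P, a=1, k=0, U=U)
```

Under perfect sensing, observing the sensed channel removes all uncertainty about that channel, so in the single-channel case the belief collapses to a point mass after every sensing slot.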

The belief vector $\Lambda(t)$ together with the fully observable
buffer state $B(t)$ and residual energy $E(t)$ are thus the
sufficient statistics for making optimal sensing and access
decisions. A policy $\pi$ of the POMDP is given by a sequence
of functions: $\pi = [\pi_1, \pi_2, \ldots]$, where each function $\pi_t$ maps
from the current information state $\{\Lambda(t), B(t), E(t)\}$ to a
sensing decision $A(t)$ and a set of allowable access decisions
$\Phi(t) \in \mathcal{A}_c(B(t), E(t))$ in slot t. If $\pi_t$ is identical for all t, $\pi$
is called a stationary policy.

2 If packet arrivals are modeled as an MMPP, then the system state should
also include the state of the underlying MMPP in addition to these three
components $(S(t), B(t), E(t))$.

3 Since we have assumed perfect spectrum sensing, the current SOS
information provided by the secondary user's access decisions is contained
in the sensing outcome. The incorporation of the sensing decisions and the
observations suffices.

Reward and Objective A natural definition of the reward is
the number of bits delivered by the secondary user in a slot,
which is assumed to be proportional to the channel bandwidth.
Specifically, we define the immediate reward $R^{(A,\Phi)}_{B,E,\Theta}(t)$ as

$$R^{(A,\Phi)}_{B,E,\Theta}(t) = 1_{[A(t) > 0]}\, 1_{[\Phi(t) \in \mathcal{A}_c(B(t), E(t)),\ \Phi_\Theta(t) = 1]}\, W_{A(t)}. \tag{18}$$

Note that $1_{[A(t) > 0]} = 1$ iff the secondary user has sensed
a channel, and $1_{[\Phi(t) \in \mathcal{A}_c(B(t), E(t)),\ \Phi_\Theta(t) = 1]} = 1$ iff the
secondary user has successfully transmitted a packet.

As noted in Section III-A, the POMDP terminates, i.e.,
no reward will be accumulated, once the residual energy
$E(t)$ drops below $e_s + \varepsilon_1$. Hence, the total expected reward
represents the total expected number of bits delivered by the
secondary user during its battery lifetime. The objective of the
POMDP can thus be written as

$$\pi^* = \arg\max_{\pi}\ \mathbb{E}_\pi \Big[ \sum_{t \ge 1} R^{(A,\Phi)}_{B,E,\Theta}(t) \,\Big|\, \Lambda(1),\ E(1) = \mathcal{E}_0 \Big], \tag{19}$$

where $\Lambda(1)$ is the initial belief vector, which can be set to
the stationary distribution of the SOS if no information on the
initial SOS is available.


IV.  Optimal  Energy-Constrained  OSA  Design 

In  this  section,  we  derive  recursive  formulas  for  calculating 
the  optimal  policies  of  the  POMDP  given  in  (19).  We  also 
develop  structural  results  for  an  efficient  calculation. 

A.  Stationary  Optimal  Policy 

Stationary  policies  are  usually  preferred  due  to  reduced 
memory  requirements  and  low  complexity  in  implementation. 
We  show  that  the  POMDP  given  in  (19)  has  a  stationary 
optimal  policy. 

Proposition  1:  There  exist  stationary  optimal  sensing  and 
access  policies  for  optimal  energy-constrained  OSA  design. 

Proof:  The  proof  is  based  on  the  fact  that  the  POMDP 
given  in  (19)  terminates  in  a  finite  but  random  stopping  time. 
See  [10]  for  details.  □ 

Proposition  1  enables  us  to  focus  on  stationary  policies 
without  losing  optimality.  For  brevity,  we  omit  the  time  index 
in  subsequent  sections. 

B.  Optimality  Equation 

The next step to solving (19) is to express the objective
explicitly as a function of the information state and the
actions. Given current information state $(\Lambda, B, E)$, we let
$Q(\Lambda, B, E \mid 0)$ and $Q(\Lambda, B, E \mid A, \Phi)$ be the maximum
expected total reward that can be obtained by taking actions
$A = 0$ and $\{A > 0,\ \Phi \in \mathcal{A}_c(B, E)\}$, respectively. The value
function $V(\Lambda, B, E)$, defined as the maximum expected total
reward that can be accumulated starting from information state
$(\Lambda, B, E)$, can be written in terms of the Q-functions:

$$V(\Lambda, B, E) = \max\Big\{ Q(\Lambda, B, E \mid 0),\ \max_{A > 0,\ \Phi \in \mathcal{A}_c(B, E)} Q(\Lambda, B, E \mid A, \Phi) \Big\},$$

$$V(\Lambda, B, E) = 0, \quad \text{if } E < e_s + \varepsilon_1. \tag{20}$$

We derive below iterative formulas for calculating the
value function and the Q-functions. In the sleeping mode
$A = 0$, no immediate reward will be obtained. Hence, the
maximum expected total reward $Q(\Lambda, B, E \mid 0)$ is given by the
future reward $V(\Lambda', B', E')$, where $\{\Lambda', B', E'\}$ represents
the updated information state. Specifically, we obtain that

$$Q(\Lambda, B, E \mid 0) = \sum_{B' \in \mathcal{B}} P_B^{(0)}(B' \mid B)\, V(T_\Lambda(\Lambda \mid 0),\ B',\ T_E(E \mid 0)), \tag{21}$$

where $P_B^{(0)}(B' \mid B)$ governs the transition of the buffer state and
is given by (13), and the updated belief vector $T_\Lambda(\Lambda \mid 0)$ and residual
energy $T_E(E \mid 0)$ are given by (17) and (14), respectively.

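In the sensing mode $A > 0$, the maximum expected total
reward $Q(\Lambda, B, E \mid A, \Phi)$ consists of two parts: the immediate
reward $R^{(A,\Phi)}_{B,E,\Theta}$ defined in (18) and the future reward
$V(\Lambda', B', E')$. Averaging over all possible SOS, observations,
and packet arrivals, we obtain that

$$Q(\Lambda, B, E \mid A, \Phi) = \sum_{s, s' \in \mathcal{S}} \lambda_{s'} P_S(s \mid s') \sum_{k=0}^{L} U(k \mid s, A) \Big[ R^{(A,\Phi)}_{B,E,k} + \sum_{B' \in \mathcal{B}} P_B^{(\Phi_k)}(B' \mid B)\, V(T_\Lambda(\Lambda \mid A, k),\ B',\ T_E(E \mid A, \Phi, k)) \Big], \tag{22}$$

where $P_B^{(\Phi_k)}(B' \mid B)$, $T_\Lambda(\Lambda \mid A, \Theta)$, and $T_E(E \mid A, \Phi, \Theta)$ are
given in (13), (17), and (14), respectively.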

Using (20)-(22), we can solve the value function and the
Q-functions recursively in an increasing order of the residual
energy  E.  The  optimal  sensing  and  access  decisions  are  then 
given  by  the  maximizers  of  (20).  Algorithms  for  solving 
POMDPs  exist  in  the  literature  [9]  and  are  applicable  here. 
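
To make the recursion in (20)-(22) concrete, the following is a minimal memoized sketch for the single-channel, single-packet-buffer case (N = 1, l = 1). It is not the authors' implementation. Energies are expressed in integer units, the arrival rate `RATE` and bandwidth `W` are assumed values, and the maximization over the access vector is carried out separately for each power level, which is valid because the terms of (22) decouple across k.

```python
import math
from functools import lru_cache

# Illustrative parameters in integer energy units (0.1 of the units in Fig. 2).
P01, P11 = 0.3, 0.7              # Pr{idle | was busy}, Pr{idle | was idle}
E_P, E_S = 1, 5                  # sleeping and sensing energy per slot
TX = (10, 20, 30, 40)            # transmission energies eps_1 < ... < eps_L
P_FADE = (0.2, 0.3, 0.3, 0.2)    # p(k), distribution of the required level
RATE, W = 0.5, 1.0               # assumed arrival rate and channel bandwidth
Q0 = math.exp(-RATE)             # probability of no arrival in a slot

def buffer_row(b, sent):
    """Next-buffer distribution for a single-packet buffer (l = 1)."""
    return (0.0, 1.0) if b - sent == 1 else (Q0, 1.0 - Q0)

def future(lam1, b, sent, energy):
    """Expected value of the next information state, averaged over arrivals."""
    row = buffer_row(b, sent)
    return sum(row[nb] * value(lam1, nb, energy)[0]
               for nb in (0, 1) if row[nb] > 0)

@lru_cache(maxsize=None)
def value(lam1, b, energy):
    """Return (V, sensing action, access vector) for state (lam1, b, energy)."""
    if energy < E_S + TX[0]:                        # terminal condition in (20)
        return 0.0, 0, ()
    omega = round((1 - lam1) * P01 + lam1 * P11, 12)  # Pr{channel idle now}
    q_sleep = future(omega, b, 0, energy - E_P)       # sleeping mode, (21)
    # Sensing mode, (22): a busy outcome reveals S(t) = 0, an idle one S(t) = 1.
    q_sense = (1 - omega) * future(0.0, b, 0, energy - E_S)
    stay = future(1.0, b, 0, energy - E_S)
    phi = [0]
    for eps, p_k in zip(TX, P_FADE):
        go = -math.inf
        if b > 0 and energy >= E_S + eps:             # feasibility from (12)
            go = W + future(1.0, b, 1, energy - E_S - eps)
        phi.append(1 if go > stay else 0)
        q_sense += omega * p_k * max(stay, go)
    if q_sense > q_sleep:
        return q_sense, 1, tuple(phi)
    return q_sleep, 0, ()

v, action, access = value(0.5, 1, 30)
```

Because every action consumes a positive amount of energy, each recursive call moves to a strictly smaller residual energy, so the recursion terminates and visits energy levels in the order suggested above. Ties between sleeping and sensing are broken in favor of sleeping in this sketch.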

C.  Monotonicity  Results  on  Optimal  Design 

While powerful in problem modeling, POMDPs are generally
computationally expensive. Structural results are thus
desirable  since  they  can  provide  insights  into  the  underlying 
problem  and  accelerate  computations  [11].  By  exploiting  the 
rich  structure  of  the  energy-constrained  OSA  problem,  we 
develop  monotonicity  results  for  the  optimal  sensing  and 
access  policies  in  Propositions  2  and  3. 

Proposition 2: Threshold Optimal Sensing Policy
Consider the single-channel (N = 1) and single-buffer
(l = 1) case. The optimal decision on whether to sense
is a threshold policy in terms of the conditional probability
that the channel is available. Specifically, given buffer state
B = 1 and residual energy E, there exists a threshold
$\tau_{th} \in [\min\{P_S(1|0), P_S(1|1)\},\ \max\{P_S(1|0), P_S(1|1)\}]$ such
that the optimal sensing decision $A^*$ is given by

$$A^* = \begin{cases} 1, & \text{if } \lambda_0 P_S(1|0) + \lambda_1 P_S(1|1) > \tau_{th}, \\ 0, & \text{otherwise}, \end{cases} \tag{23}$$

where $\lambda_0 P_S(1|0) + \lambda_1 P_S(1|1)$ is the probability that the
channel is available given current belief vector $\Lambda = [\lambda_0, \lambda_1]$.

Proof:  See  [10].  □ 

Recall that a stationary sensing policy is given by a function
that specifies a sensing decision A for each possible
information state $\{\Lambda, B, E\}$ (or equivalently $\{\lambda_1, B, E\}$ since
$\lambda_0 = 1 - \lambda_1$ when N = 1). Proposition 2 indicates that the
optimal sensing policy can also be represented by a function
mapping from the secondary user's local state (B, E) to a
threshold $\tau_{th}$ on the sensing decisions. Since the threshold
$\tau_{th} \in [\min\{P_S(1|0), P_S(1|1)\},\ \max\{P_S(1|0), P_S(1|1)\}]$
belongs to a subset of the belief space $\lambda_1 \in [0, 1]$, the search
for the optimal threshold $\tau_{th}$ is less complex than finding the
optimal decision for each belief vector.
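
One way to exploit this structure is a bisection search over the channel availability probability. The sketch below assumes access to some routine `decide` that returns the optimal sensing decision for a given availability probability at a fixed local state (B, E); `decide` is a hypothetical stand-in for a POMDP solver, and the monotonicity it relies on is the one stated in Proposition 2.

```python
def find_threshold(decide, lo, hi, tol=1e-6):
    """Locate the sensing threshold on [lo, hi] by bisection.

    decide(w) -> 0 or 1 must be monotone in w: 0 below the threshold,
    1 above it, as Proposition 2 asserts for the optimal policy.
    """
    if decide(lo) == 1:
        return lo                       # sensing is optimal on the whole range
    if decide(hi) == 0:
        return hi                       # sensing is never optimal on the range
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if decide(mid) == 1:
            hi = mid
        else:
            lo = mid
    return hi

# Stand-in decision rule with a made-up threshold of 0.42.
tau = find_threshold(lambda w: 1 if w > 0.42 else 0, lo=0.3, hi=0.7)
```

The search needs about $\log_2((hi - lo)/tol)$ solver calls per local state instead of one call per belief value on a fine grid.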

Proposition 3: Threshold Optimal Access Policy
For a given sensing action $A > 0$, the optimal access
decision is non-decreasing in the channel fading condition.
Specifically, given belief vector $\Lambda$ and residual energy E,
there exists a threshold $k_{th} \in \{1, \ldots, L\}$ such that the optimal
access decision $\Phi_k^*$ is given by

$$\Phi_k^* = \begin{cases} 1, & \text{if } k < k_{th}, \\ 0, & \text{otherwise}. \end{cases} \tag{24}$$

Furthermore, the threshold $k_{th}$ is independent of the belief
vector in the single-channel case (N = 1).

Proof:  See  [10].  □ 

Proposition 3 extends [4] by considering the buffer state in
the design of energy-constrained OSA. It enables us to reduce
the access action space $\mathcal{A}_c(B, E)$ to

$$\mathcal{A}_c(B, E) = \{\Phi : \Phi_0 = 0;\ 1 \ge \Phi_1 \ge \cdots \ge \Phi_L \ge 0;\ \Phi_k = 0 \text{ if } E < e_s + \varepsilon_k \text{ or } B = 0\}. \tag{25}$$

Hence, the size of the access action space is reduced from
exponential $2^L$ as given in (12) to linear L in the number of
power levels, leading to a more efficient search for the optimal
access policy.
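
The reduction can be seen by enumerating only the monotone access vectors. The following sketch uses hypothetical names and example values; it lists vectors whose first j power levels are accessed, for every feasible j, which gives at most L + 1 candidates including the all-zero vector.

```python
def threshold_access_vectors(buffer, energy, e_s, tx_levels):
    """Monotone access vectors: Phi_k = 1 for k <= j and 0 for k > j."""
    n_levels = len(tx_levels)
    vectors = []
    for j in range(n_levels + 1):
        if j > 0 and (buffer == 0 or energy < e_s + tx_levels[j - 1]):
            break                       # levels are increasing, so stop here
        vectors.append((0,) + (1,) * j + (0,) * (n_levels - j))
    return vectors

full = threshold_access_vectors(1, 100.0, 0.5, [1, 2, 3, 4])
limited = threshold_access_vectors(1, 2.6, 0.5, [1, 2, 3, 4])
```

For L = 4 and ample energy this yields 5 candidates instead of the $2^4 = 16$ vectors allowed by (12).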

Furthermore,  Proposition  3  indicates  that  the  optimal  access 
policy  is  independent  of  the  belief  vector  when  N  =  1.  That 
is,  the  optimal  access  policy  can  be  specified  by  a  function 
mapping from the secondary user's local state (B, E) to a
threshold $k_{th}$ for the access decisions. Since there are only
finitely many local states (B, E), the complexity of calculating
the  optimal  access  policy  can  be  significantly  reduced. 


D. Distributed Implementation

As seen from (20), the information state $\{\Lambda, B, E\}$ governs
the channel selection. Hence, to ensure synchronous hopping
in the spectrum without introducing extra control message
exchange, the secondary transmitter and its desired receiver
must maintain the same information state in each slot. In [4],
we have described how to achieve synchronous update of the
belief vector $\Lambda$ and the residual energy E at the transmitter and
the receiver. Below we briefly comment on the synchronous
update of the buffer state B in an ad hoc secondary network
where there is no central coordinator or dedicated
communication/control channel. For a detailed description of distributed
implementation, readers are referred to [3].

Due to the random packet arrival process, the receiver does
not know the exact buffer state B of the transmitter. Hence,
to ensure synchronous hopping, the transmitter should use the
receiver's knowledge $\hat{B}$ of the buffer state for decision-making
in each slot. Meanwhile, the transmitter should inform the
receiver of its true buffer state B so that the receiver can
update its knowledge. We propose the use of the request-to-send
(RTS) and clear-to-send (CTS) messages to synchronize
the buffer states at the transmitter and the receiver. Specifically,
the transmitter piggybacks its true buffer state B on every
RTS message in the opportunity identifying stage (see [3],
[4] for details). The receiver confirms the reception of the
buffer state in its CTS message. The buffer
state used for decision-making is then updated as $\hat{B} = B$ at both
the transmitter and the receiver after the successful RTS-CTS
exchange. If the transmitter fails to update the
receiver's knowledge $\hat{B}$ of the buffer state for a long period of
time, we can reset the buffer state $\hat{B}$ used for decision-making
according to the transmitter's traffic statistics.

V. Numerical Results

In this section, we provide numerical results to study the
impact of the secondary user's traffic statistics $\lambda$ and residual
energy E on the optimal energy-constrained OSA design.
In all figures, the optimal sensing and access decisions are
determined by solving (20) recursively for the information
state $\{\Lambda, B, E\}$ of interest. We assume that the secondary user
has a single-size buffer, i.e., l = 1.

A. Optimal Decisions for Non-Empty Buffer

Fig. 2. Optimal thresholds $\tau_{th}$ for making sensing decisions $A^*$ when
the buffer is non-empty. $e_s = 0.5$, $e_p = 0.1$, $\mathcal{E}_{tx} = \{1, 2, 3, 4\}$,
$[p_n(1), p_n(2), p_n(3), p_n(4)] = [0.2, 0.3, 0.3, 0.2]$.


We  first  consider  the  case  where  the  secondary  user’s  buffer 
is  non-empty.  We  consider  the  single-channel  case  N  =  1 
in which the SOS transition is characterized by $P_S(i \mid j) =
\Pr\{S(t+1) = i \mid S(t) = j\}$, i, j = 0, 1. The optimal sensing
and access policies are thus given by the thresholds $\tau_{th}$ and
$k_{th}$ as stated in Propositions 2 and 3.

In Fig. 2, we plot the optimal sensing threshold $\tau_{th}$ as a
function of the residual energy E for different packet arrival
rates $\lambda$. In the upper plot, we consider the case where
$P_S(1|1) = 0.7$ and $P_S(1|0) = 0.3$, i.e., the channel occupancy
state remains unchanged with a large probability. The opposite
case where $P_S(1|1) = 0.3$ and $P_S(1|0) = 0.7$ is considered
in the lower plot. We see that when the residual energy E
is small, the optimal threshold $\tau_{th}$ is highly dependent on
the packet arrival rate $\lambda$. As residual energy E increases, the
impact of $\lambda$ and E on the optimal threshold $\tau_{th}$ diminishes.
We notice that the optimal thresholds $\tau_{th}$ for different packet
arrival rates converge to a common steady value when the
residual energy E is large.


Fig. 3. Optimal thresholds $k_{th}$ for making access decisions when the
buffer is non-empty. $P_S(1|1) = 0.7$, $P_S(1|0) = 0.3$, $e_p = 0.1$, $\mathcal{E}_{tx} =
\{1, 2, 3, 4\}$, $[p_n(1), p_n(2), p_n(3), p_n(4)] = [0.2, 0.3, 0.3, 0.2]$.


In Fig. 3, we plot the optimal access threshold $k_{th}$ for
different packet arrival rates $\lambda$. As expected, the optimal
threshold $k_{th}$ increases with the sensing energy consumption
$e_s$ (see [4] for explanation). Similar to the optimal sensing
thresholds $\tau_{th}$, the optimal access thresholds $k_{th}$ for different
packet arrival rates $\lambda$ may differ from each other when the
residual energy E is small, but a common steady value will be
reached when E is large.

Combining Figs. 2 and 3, we see that the impact of the
residual energy E and the traffic statistics $\lambda$ on the optimal
sensing and access decisions is negligible when the residual
energy E is sufficiently large. This observation suggests a
complexity-reduced OSA strategy. Specifically, the secondary
user only needs to calculate and store the optimal policies for
small residual energies $E \le E^*$. When $E > E^*$, the secondary
user can simply adopt the optimal decisions for $E = E^*$.
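
In implementation terms, this strategy amounts to clamping the energy index of a stored policy table. The sketch below is purely illustrative: the table layout, the function name, and the numbers in the table are placeholders, not results from the paper.

```python
def reduced_policy(table, buffer, energy, e_star):
    """Return the stored decision, reusing the entry at e_star when E > e_star."""
    return table[(buffer, min(energy, e_star))]

# Placeholder table in integer energy units: (buffer, energy) -> threshold.
# The values are made up solely to exercise the lookup.
table = {(1, e): 0.40 + 0.01 * (20 - e) for e in range(15, 21)}

low = reduced_policy(table, buffer=1, energy=17, e_star=20)
high = reduced_policy(table, buffer=1, energy=35, e_star=20)
```

The table then needs entries only for residual energies up to $E^*$, regardless of the initial battery level.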


B.  Optimal  Decisions  for  Empty  Buffer 


We  note  that  even  if  the  buffer  is  empty,  the  secondary  user 
may  want  to  sense  a  channel  in  order  to  gain  information  on 
the SOS for future use. Next, we study the optimal decision
$1_{[A^* > 0]}$ on whether to sense when B = 0.

Consider two coupled channels where the SOS is either
$S(t) = [0, 1]$ (i.e., only channel 2 is idle) or $S(t) = [1, 0]$ (i.e.,
only channel 1 is idle). We assume that $P_S([1, 0] \mid [0, 1]) =
P_S([0, 1] \mid [1, 0]) = \alpha$ so that the correlation between the
SOS in two successive slots can be characterized by a single
parameter $\rho = 1 - 2\alpha$. Extensive numerical results show
that the optimal decision $1_{[A^* > 0]}$ on whether to sense is
non-decreasing in the SOS correlation $|\rho|$. Specifically, given the
secondary user's residual energy E, there exists a threshold
$\rho_{th} \in [0, 1]$ such that

$$1_{[A^* > 0]} = \begin{cases} 1, & \text{if } |\rho| > \rho_{th}, \\ 0, & \text{otherwise}. \end{cases} \tag{26}$$
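
The relation $\rho = 1 - 2\alpha$ can be checked numerically. The sketch below computes the lag-one correlation of the two-state chain under its stationary distribution (1/2, 1/2), encoding the SOS by a single indicator of whether channel 1 is idle; the function name is hypothetical.

```python
def lag_one_correlation(alpha):
    """Correlation between the SOS indicator in two successive slots."""
    stationary = (0.5, 0.5)
    P = ((1 - alpha, alpha), (alpha, 1 - alpha))   # symmetric switching chain
    mean = sum(pi * x for x, pi in enumerate(stationary))
    var = sum(pi * (x - mean) ** 2 for x, pi in enumerate(stationary))
    cov = sum(stationary[x] * P[x][y] * (x - mean) * (y - mean)
              for x in (0, 1) for y in (0, 1))
    return cov / var

rho = lag_one_correlation(0.1)
```

A value of $\alpha$ near 0 or 1 gives $|\rho|$ close to 1, so a single observation is highly informative about the next slot, whereas $\alpha = 0.5$ gives $\rho = 0$ and sensing with an empty buffer yields no useful information.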


In Fig. 4, we plot the threshold $\rho_{th}$ on the SOS correlation
as a function of the residual energy E for different packet
arrival rates $\lambda$. We see that the threshold $\rho_{th}$ decreases with
$\lambda$. Intuitively, when $\lambda$ is large, there is a high probability that
packets will arrive in this slot, and hence the secondary user
should be more active in collecting information on the SOS
for better channel selection in the next slot. We also observe
that the threshold $\rho_{th}$ increases with the residual energy E.

Fig. 4. Thresholds $\rho_{th}$ on the SOS correlation for making optimal sensing
decisions $1_{[A^* > 0]}$ when the buffer is empty. The initial belief vector $\Lambda(1)$ is
given by the stationary distribution of the underlying Markov process. $e_p =
0.1$, $e_s = 0.2$, $\mathcal{E}_{tx} = \{1, 2\}$, $[p_n(1), p_n(2)] = [0.6, 0.4]$.

VI.  Conclusion 

Within  the  framework  of  POMDP,  we  incorporated  the 
bursty  traffic  of  secondary  users  in  the  design  of  energy- 
constrained  OSA.  We  developed  monotonicity  results  on  the 
optimal  sensing  and  access  policies  for  efficient  computation. 
Numerical  results  revealed  that  the  impact  of  the  secondary 
user’s  traffic  statistics  and  residual  energy  on  the  optimal 
sensing  and  access  decisions  diminishes  when  the  residual 
energy  is  large. 